A system and method are disclosed for reducing
perplexity in a speech recognition system based 
upon determined geographic location. In a mobile 
speech recognition system which processes input 
frames of speech against stored templates repre- 
senting speech, a core library of speech templates is 
created and stored representing a basic vocabulary 
of speech. Multiple location-specific libraries of 
speech templates are also created and stored, each 
library containing speech templates representing a 
specialized vocabulary for a specific geographic lo- 
cation. The geographic location of the mobile speech 
recognition system is then periodically determined 
utilizing a cellular telephone system, a geoposition- 
ing satellite system or other similar systems, and a 
particular one of the location-specific libraries of 
speech templates is identified for the current location 
of the system. Input frames of speech are then 
processed against the combination of the core li- 
brary and the particular location-specific library to 
greatly enhance the accuracy and efficiency of 
speech recognition by the system. Each location-
specific library preferably includes speech templates
representative of location place names, proper 
names, and business establishments within a spe- 
cific geographic location. 
The present invention relates in general to
speech recognition systems and in particular to a 
system and method for enhancing speech recogni- 
tion accuracy in a mobile speech recognition sys- 
tem. 

Speech recognition is well known in the prior
art. The recognition of isolated words from a given
vocabulary for a known speaker is perhaps the
simplest type of speech recognition and this type 
of speech recognition has been known for some 
time. Words within the vocabulary to be recognized 
are typically prestored as individual templates, 
each template representing the sound pattern for a 
word in the vocabulary. When an isolated word is
spoken, the system merely compares the word to
each individual template which represents the vo-
cabulary. This technique is commonly referred to
as whole-word template matching. Many successful 
speech recognition systems use this technique with 
dynamic programming to cope with nonlinear time 
scale variations between the spoken word and the 
prestored template. 
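
By way of illustration only, whole-word template matching with dynamic programming may be sketched as follows. The sketch assumes that each utterance and each template has already been reduced to a short sequence of one-dimensional feature values; the function names, the absolute-difference local cost and the sample templates are hypothetical and are not taken from any of the cited systems.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # A frame may be stretched, compressed, or matched one-to-one.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize_isolated(utterance, templates):
    """Return the vocabulary word whose template is closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical two-word vocabulary.
templates = {"yes": [1.0, 3.0, 2.0], "no": [5.0, 5.0, 1.0]}
# A slowed-down rendition of "yes": the same contour with repeated frames.
print(recognize_isolated([1.0, 1.0, 3.0, 3.0, 2.0], templates))
```

The warping step is what allows a slowly spoken word to match a template recorded at a different speaking rate, which is the nonlinear time scale variation referred to above.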

Of greater difficulty is the recognition of con- 
tinuous speech or speech which contains proper 
names or place names. Continuous speech, or 
connected words, have been recognized in the 
prior art utilizing multiple path dynamic program- 
ming. One example of such a system is proposed 
in "Two Level DP Matching A Dynamic Program-
ming Based Pattern Matching Algorithm For Con-
nected Word Recognition," H. Sakoe, IEEE Trans-
actions on Acoustics Speech and Signal Process-
ing, Volume ASSP-27, No. 6, pages 588-595, De-
cember 1979. This paper suggests a two-pass dy-
namic programming algorithm to find a sequence
of word templates which best matches the whole
input pattern. Each pass through the system gen-
erates a score which indicates the similarity be-
tween every template matched against every possi-
ble portion of the input pattern. In a second pass
the score is then utilized to find the best sequence 
of templates corresponding to the whole input pat- 
tern. 
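
The two-pass idea described above may be sketched, in much simplified form, as follows. This is not the algorithm of the cited paper, which includes constraints omitted here; it is a minimal sketch assuming one-dimensional features, an absolute-difference local cost and an exhaustive search over every portion of the input. All names are hypothetical.

```python
from math import inf


def dtw(a, b):
    """DTW distance between two 1-D feature sequences (word-level matching)."""
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = abs(a[i - 1] - b[j - 1]) + step
    return cost[-1][-1]


def recognize_connected(frames, templates):
    """Return the word sequence whose templates best cover the whole input."""
    n = len(frames)
    # First pass: best (score, word) for every portion frames[s:e] of the input.
    segment = {}
    for s in range(n):
        for e in range(s + 1, n + 1):
            segment[s, e] = min((dtw(frames[s:e], t), w) for w, t in templates.items())
    # Second pass: best[e] is the lowest total score covering frames[:e].
    best = [0.0] + [inf] * n
    back = [None] * (n + 1)
    for e in range(1, n + 1):
        for s in range(e):
            score, word = segment[s, e]
            if best[s] + score < best[e]:
                best[e] = best[s] + score
                back[e] = (s, word)
    # Trace the chosen segment boundaries back from the end of the input.
    words, e = [], n
    while e > 0:
        s, word = back[e]
        words.append(word)
        e = s
    return words[::-1]


# Hypothetical templates and a connected utterance of "one" followed by "two".
templates = {"one": [1.0, 2.0, 1.0], "two": [5.0, 6.0, 5.0]}
print(recognize_connected([1.0, 2.0, 2.0, 1.0, 5.0, 6.0, 5.0], templates))
```

The cost of the first pass grows with both the vocabulary size and the number of input portions, which is one reason a smaller active vocabulary is attractive for connected speech.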

United States Patent No. 5,040,127 proposes a 
continuous speech recognition system which pro- 
cesses continuous speech by comparing input 
frames against prestored templates which repre- 
sent speech and then creating links between 
records in a linked network for each template under 
consideration as a potentially recognized individual 
word. The linked records include ancestor and de- 
scended link records which are stored as indexed 
data sets with each data set including a symbol 
representing a template, a sequence indicator re- 
presenting the relative time the link record was 
stored and a pointer indicating a link record in the 
network from which it descends. 



The recognition of proper names represents an
increase in so-called "perplexity" for speech rec-
ognition systems and this difficulty has been re-
cently recognized in U.S. Patent No. 5,212,730.
This patent performs name recognition utilizing
text-derived recognition models for recognizing the
spoken rendition of proper names which are sus-
ceptible to multiple pronunciations. A name rec-
ognition technique set forth within this patent in-
volves entering the name-text into a text database
which is accessed by designating the name-text
and thereafter constructing a selected number of
text-derived recognition models from the name-text
wherein each text-derived recognition model repre-
sents at least one pronunciation of the name.
Thereafter, for each attempted access to the text
database by a spoken name input the text
database is compared with the spoken name input
to determine if a match may be accomplished.

U.S. Patent No. 5,202,952 discloses a large-
vocabulary continuous-speech prefiltering and pro-
cessing system which recognizes speech by con-
verting the utterances to frame data sets wherein
each frame data set is smoothed to generate a
smooth frame model over a predetermined number
of frames. Clusters of word models which are
acoustically similar over a succession of frame
periods are designated as a resident vocabulary
and a cluster score is then generated by the sys-
tem which includes the likelihood of the smooth
frames evaluated utilizing a probability model for
the cluster against which the smoothed frame
model is being compared.

Each of these systems recognizes that suc-
cessful speech recognition requires a reduction in
the perplexity of a continuous-speech utterance.
Publications which address this problem are "Per-
plexity-A Measure of Difficulty of Speech Recogni-
tion Tasks," Journal of the Acoustical Society of
America, Volume 62, Supplement No. 1, page S-
63, Fall 1977, and the "Continuous Speech Rec-
ognition Statistical Methods" in the Handbook of
Statistics Volume 2: Classification, Pattern Recogni-
tion and Reduction of Dimensionality, pages 549-
573, North-Holland Publishing Company, 1982.
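
As a rough illustration of why vocabulary size matters, perplexity is commonly defined as two raised to the entropy, in bits, of the distribution over the next word. The sketch below applies that standard definition under the simplifying assumption that every word in the active vocabulary is equally likely, in which case the perplexity equals the vocabulary size. The library sizes are hypothetical and do not come from this specification.

```python
from math import log2


def perplexity(probabilities):
    """Perplexity 2**H of a probability distribution over the next word."""
    entropy = -sum(p * log2(p) for p in probabilities if p > 0)
    return 2 ** entropy


def uniform(n):
    """Uniform distribution over n equally likely words."""
    return [1.0 / n] * n


# Hypothetical sizes: a core vocabulary plus 20 local vocabularies.
core_words, local_words, locations = 1000, 500, 20

all_libraries = perplexity(uniform(core_words + local_words * locations))
one_library = perplexity(uniform(core_words + local_words))
print(round(all_libraries), round(one_library))
```

Under these assumed sizes, restricting recognition to the core vocabulary plus a single local vocabulary lowers the perplexity by roughly a factor of seven relative to searching every local vocabulary at once.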

In view of the above, it is apparent that suc- 
cessful speech recognition requires an enhanced 
ability to distinguish between large numbers of like 
sounding words, a problem which is particularly 
difficult with proper names, place names and num-
bers.

It is therefore an object of the present invention 
to provide a speech recognition system, and meth- 
od of operation of such a system, with enhanced 
speech recognition accuracy and efficiency.

Accordingly the present invention provides a 
mobile speech recognition system comprising: an 
audio input means for receiving input speech; a 
storage means; a core library of speech templates
stored within said storage means representing a 
basic vocabulary of speech; a plurality of location- 
specific libraries of speech templates stored within 
said storage means, each representing a special- 
ized vocabulary for a particular geographic location; 
location determination means for determining a 
geographic location of said mobile speech recogni- 
tion system; library selection means coupled to 
said storage means and said location determination 
means for selecting a particular one of said plural- 
ity of location-specific libraries in response to a 
determination of said geographic location of said 
mobile speech recognition system; and speech 
processor means coupled to said audio input 
means and said storage means for processing in- 
put speech frames against said core library and 
said particular one of said plurality of location- 
specific libraries. 

Viewed from a second aspect the present in- 
vention provides a method of operating a mobile 
speech recognition system to process input frames 
of speech against stored templates representing 
speech, said method comprising the steps of: stor- 
ing in a memory a core library of speech templates 
representing a basic vocabulary of speech; storing 
a plurality of location-specific libraries of speech 
templates, each representing a specialized vocabu- 
lary for a particular geographic location; determin- 
ing a geographic location of said mobile speech 
recognition system; associating said core library of 
speech templates with a particular one of said 
plurality of location-specific libraries of speech tem- 
plates in response to said determination of said 
geographic location of said mobile speech recogni- 
tion system; and employing a processor to process 
input frames of speech against said core library 
and associated location-specific library. 

From the above it is apparent that the present 
invention provides a system and method for en-
hanced speech recognition in a mobile system
utilizing location-specific libraries of speech tem- 
plates and an identification of the system location. 

The system and method of the preferred em- 
bodiment of the invention reduce perplexity in a 
speech recognition system based upon determined 
geographic location. In a mobile speech recognition 
system according to the preferred embodiment in- 
put frames of speech are processed against stored 
templates representing speech, a core library of 
speech templates being created and stored repre- 
senting a basic vocabulary of speech. Multiple lo- 
cation-specific libraries of speech templates are 
also created and stored, each library containing 
speech templates representing a specialized vo- 
cabulary for a specific geographic location. The 
geographic location of the mobile speech recogni-
tion system is then periodically determined utilizing
a cellular telephone system, a geopositioning sat-
ellite system or other similar systems and a par-
ticular one of the location-specific libraries of
speech templates is identified for the current loca-
tion of the system. Input frames of speech are then
processed against the combination of the core li-
brary and the particular location-specific library to
greatly enhance the efficiency of speech recogni-
tion by the system. Each location-specific library
preferably includes speech templates representa-
tive of location place names, proper names, and
business establishments within a specific geo-
graphic location.

The present invention will be described further,
by way of example only, with reference to a pre-
ferred embodiment thereof as illustrated in the ac-
companying drawings, in which:

Figure 1 is a pictorial representation of a mobile
speech recognition system which may be uti-
lized to implement the system and method of
the present invention;

Figure 2 is a high-level block diagram of the
mobile speech recognition system of Figure 1;
and

Figure 3 is a high-level logic flowchart illustrat-
ing a process for implementing the method of
the present invention.

With reference now to the figures and in par-
ticular with reference to Figure 1, there is depicted
a pictorial representation of a mobile speech rec-
ognition system 12 which may be utilized to imple-
ment the system and method of the preferred
embodiment of the present invention. As illustrated,
mobile speech recognition system 12 may be im-
plemented utilizing any suitably programmed porta-
ble computer, such as a so-called "notebook com-
puter." As depicted, mobile speech recognition
system 12 may include a keyboard 14, a display
16 and a display screen 18. Additionally, as will be
explained in greater detail herein, mobile speech
recognition system 12 may also include an antenna
20 which may be utilized to electronically deter-
mine the specific geographic location of mobile
speech recognition system 12 in response to de-
tection of a verbal utterance.

Also depicted within Figure 1 is an audio input
device which is coupled to mobile speech recogni-
tion system 12. Microphone 22 serves as an audio
input device for mobile speech recognition system
12 and, in a manner well known to those having
ordinary skill in the speech recognition art, may be
utilized to capture verbal utterances spoken by a
user of mobile speech recognition system 12, in
order to provide additional information, perform
specific functions or otherwise respond to verbal
commands.

Mobile speech recognition system 12 is char-
acterized as mobile within this specification and it
is anticipated that such systems will find applica-
tion within mobile platforms, such as automobiles,
police cars, fire trucks, ambulances, and personal
digital assistants (PDAs) which may be carried on
the person of a user. Upon reference to this disclo-
sure, those skilled in the art will appreciate that
speech recognition on a mobile platform represents
a definite increase in the likely perplexity of the
speech recognition problem due to the necessity
that the system recognize street names, street ad-
dresses, restaurant names, business names and
other proper names associated with a specific geo-
graphic location at which the mobile system may
be located.

EP 0 661 688 A2

In order to solve this problem mobile speech 
recognition system 12 preferably includes a device 
for determining the geographic location of the sys-
tem. This may be accomplished utilizing many
different techniques including the utilization of a 
geopositioning system, such as the Global Posi- 
tioning Satellite System. Thus, radio signals from 
satellite 28 may be received by mobile speech 
recognition system 12 at antenna 20 and may be 
utilized to determine the specific geographic loca- 
tion for mobile speech recognition system 12 at the 
time of a particular spoken utterance. Similarly, 
radio signals from a cellular phone network, such 
as cellular transmission towers 24 and 26, may 
also be utilized to accurately and efficiently deter- 
mine the geographic location of mobile speech 
recognition system 12. Additionally, while not illus- 
trated, those skilled in the art will appreciate that 
special purpose radio signals, inertial guidance
systems, or other similar electronic measures may 
be utilized to determine the geographic location of 
mobile speech recognition system 12 in a manner 
that is well within the current scope of these tech- 
nologies. Additionally, a user may simply enter an 
indication of geographic location into mobile 
speech recognition system 12 utilizing keyboard 
14. 
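
The alternative location sources described above may be combined as in the following sketch. It is a minimal illustration only: the rectangular region model, the mapping from cell identifiers to regions, the order in which the sources are consulted and every name shown are assumptions made for the example, not features stated in this specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical rectangular service area, in decimal degrees."""
    name: str
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def resolve_region(regions, cell_regions, gps_fix=None, cell_id=None, typed_name=None):
    """Return a region name from the first source that yields one, else None."""
    # Assumed priority: satellite fix, then cellular tower, then keyboard entry.
    if gps_fix is not None:
        lat, lon = gps_fix
        for region in regions:
            if region.contains(lat, lon):
                return region.name
    if cell_id is not None and cell_id in cell_regions:
        return cell_regions[cell_id]
    if typed_name is not None and any(r.name == typed_name for r in regions):
        return typed_name
    return None


# Hypothetical regions and tower assignments.
regions = [
    Region("austin", 30.0, -98.0, 30.6, -97.5),
    Region("dallas", 32.5, -97.1, 33.1, -96.5),
]
cell_regions = {"tower-24": "austin", "tower-26": "dallas"}
print(resolve_region(regions, cell_regions, gps_fix=(30.27, -97.74)))
```

Returning None when no source yields a region leaves the caller free to fall back on the core vocabulary alone.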

Referring now to Figure 2, there is depicted a 
high level block diagram of the mobile speech 
recognition system 12 of Figure 1, which illustrates 
the manner in which this geographic location deter- 
mination may be utilized to decrease the perplexity 
of speech recognition. As illustrated within Figure 
2, a memory 36 is provided within mobile speech 
recognition system 12 which includes a core library 
38 of speech templates which represent a basic 
vocabulary of speech. Similarly, multiple location- 
specific libraries 40 are also stored within memory 
36. Each location-specific library 40 includes tem- 
plates which are representative of a specialized 
vocabulary for a particular geographic location. For 
example, each location-specific library of speech 
templates may include a series of speech tem-
plates representative of street names within that
geographic location, business establishments within
that geographic location or other proper names 
which are germane to a selected geographic loca- 
tion. 

Thus, each time a speech utterance is de-
tected at microphone 22, that utterance may be
suitably converted for processing utilizing analog-
to-digital converter 34 and coupled to processor
32. Processor 32 then utilizes location determina-
tion circuitry 30 to identify a specific geographic
location for mobile speech recognition system 12
at the time of the spoken utterance. This may be
accomplished, as described above, utilizing global
positioning satellite systems or radio frequency sig-
nals from cellular telephone systems which are
received at antenna 20, or by the simple expedient
of requiring the user to periodically enter at key-
board 14 an identification for a specific geographic
location.

Next, the output of location determination cir-
cuitry 30 is utilized by processor 32 to select a
particular one of the multiple location-specific li-
braries 40 contained within memory 36. The input
frame of speech data is then compared to a com-
posite library which is comprised of core library 38
and a particular one of the location-specific libraries
40. In this manner, the perplexity of speech rec-
ognition in a mobile speech recognition system
may be greatly reduced, thereby enhancing the
accuracy and efficiency of speech recognition with-
in the system.
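
The formation of the composite library may be sketched as follows. The sketch assumes that each library is held as a mapping from a word or phrase to its stored template, and that an unrecognized location simply leaves the core library in use; both are assumptions made for illustration, as are the function name and the sample entries.

```python
def composite_library(core, location_libraries, region):
    """Merge the core library with the library selected for `region`."""
    composite = dict(core)  # copy so the stored core library is not modified
    # Assumption: if no library exists for the region, only the core is used.
    composite.update(location_libraries.get(region, {}))
    return composite


# Hypothetical templates, reduced to short feature lists for the example.
core = {"call": [1.0, 2.0], "stop": [2.0, 2.0]}
location_libraries = {
    "austin": {"congress avenue": [3.0, 1.0, 4.0]},
    "dallas": {"elm street": [4.0, 4.0, 1.0]},
}

active = composite_library(core, location_libraries, "austin")
print(sorted(active))
```

Only the entries of the selected library join the core vocabulary, so the number of templates against which each input frame is compared does not grow with the number of locations stored in memory.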

As discussed above with respect to previous
attempts at speech recognition, the templates
against which input speech is processed may com-
prise templates representing individual words,
phrases or portions of words. As utilized herein, the
term "template" shall mean any stored digital re-
presentation which may be utilized by processor 32
to identify an unknown speech utterance.

Finally, with reference to Figure 3, there is
depicted a high-level logic flowchart which illus-
trates a process for implementing the method of
the preferred embodiment of the present invention.
As depicted, this process begins at block 60 and
thereafter passes to block 62. Block 62 illustrates a
determination of whether or not a verbal utterance
has been detected. If not, the process merely it-
erates until such time as an utterance has been
detected. However, once a verbal utterance has
been detected, the process passes to block 64.

Block 64 illustrates a determination of the cur-
rent geographic location of mobile speech recogni-
tion system 12. As discussed above, this may be
accomplished utilizing the global positioning sat-
ellite system, cellular telephone systems, or other
specialized radio frequency signals or inertial navi-
gation techniques. Thereafter, the geographic loca-
tion determination is utilized to select a particular
location-specific library, as depicted at block 66.

Next, as depicted at block 68, the input utter- 
ance is processed against the core library of basic 
vocabulary words and a particular location-specific 
library which is associated with the determined 
geographic location of mobile speech recognition 
system 12. Thereafter, the process passes to block 
70. Block 70 illustrates a determination of whether 
or not the verbal utterance has been recognized. If 
not, the process passes to block 72 which depicts 
the generation of an error message, a verbalized
command urging the user to repeat the utterance, 
or other similar techniques for resolving the failure 
of the system to recognize the verbal utterance. 
After generating such a message, the process then 
passes to block 76 and returns to await the detec- 
tion of a subsequent verbal utterance. 

Referring again to block 70, in the event the 
utterance has been recognized, the process passes 
to block 74. Block 74 illustrates the processing of 
that utterance. Those skilled in the art will appre- 
ciate that a verbal utterance may be processed to 
generate information which is presented to the 
user, to control the activity of some portion of the 
system or to store data for future reference. There- 
after, as above, the process passes to block 76 and 
returns to await the detection of a subsequent 
verbal utterance. 
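
The flow of Figure 3 for a single detected utterance may be sketched as follows, with comments keyed to the block numbers used above. The function signature, the callback arguments and the toy exact-match recognizer are hypothetical stand-ins introduced for the example; a practical system would substitute its own location, recognition and response components.

```python
def process_utterance(frames, locate, libraries, core, recognize, on_recognized, on_failure):
    """Handle one detected utterance; returns the recognized word or None."""
    region = locate()                                  # block 64: determine location
    selected = libraries.get(region, {})               # block 66: select library
    composite = {**core, **selected}
    word = recognize(frames, composite)                # block 68: process utterance
    if word is None:                                   # block 70: recognized?
        on_failure("Utterance not recognized; please repeat.")  # block 72
    else:
        on_recognized(word)                            # block 74: act on utterance
    return word                                        # block 76: await next utterance


def exact_match(frames, templates):
    """Toy recognizer: a word matches only if its template equals the input."""
    for word, template in templates.items():
        if template == frames:
            return word
    return None


# Hypothetical libraries; print stands in for both response paths.
core = {"call": [1, 2]}
libraries = {"austin": {"congress avenue": [7, 8, 9]}}
process_utterance([7, 8, 9], lambda: "austin", libraries, core, exact_match, print, print)
```

Passing the location, recognition and response steps in as callables keeps the control flow of the flowchart separate from any particular positioning or matching technique.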

Upon reference to the foregoing, those skilled 
in the art will appreciate that by determining the 
geographic location of a mobile speech recognition 
system and thereafter utilizing a location-specific 
library of speech templates, the method and sys- 
tem of the preferred embodiment of the present 
invention greatly reduce the possible perplexity of
a speech recognition system and concomitantly
enhance the accuracy and efficiency of speech
recognition. 

Claims

1. A mobile speech recognition system compris-
ing:

an audio input means (22) for receiving
input speech;

a storage means (36);

a core library (38) of speech templates
stored within said storage means (36) repre-
senting a basic vocabulary of speech;

a plurality of location-specific libraries (40)
of speech templates stored within said storage
means (36), each representing a specialized
vocabulary for a particular geographic location;

location determination means (30) for de-
termining a geographic location of said mobile
speech recognition system;

library selection means (32) coupled to
said storage means (36) and said location de-
termination means (30) for selecting a particu-
lar one of said plurality of location-specific
libraries in response to a determination of said
geographic location of said mobile speech rec-
ognition system; and

speech processor means (32) coupled to
said audio input means (22) and said storage
means (36) for processing input speech frames
against said core library and said particular
one of said plurality of location-specific librar-
ies.

2. A mobile speech recognition system as
claimed in claim 1, wherein said audio input
means (22) comprises a microphone.

3. A mobile speech recognition system as
claimed in claim 1 or claim 2, wherein said
location determination means (30) comprises a
cellular telephone transceiver.

4. A mobile speech recognition system as
claimed in claim 1 or claim 2, wherein said
location determination means (30) comprises a
global positioning satellite receiver.

5. A mobile speech recognition system as
claimed in any preceding claim, wherein said
speech processor means (32) includes an ana-
log-to-digital converter (34).

6. A mobile speech recognition system as
claimed in any preceding claim, wherein said
speech processor means (32) forms part of a
personal computer.

7. A mobile speech recognition system as
claimed in any preceding claim, wherein each
of said plurality of location-specific libraries of
speech templates comprises a plurality of tem-
plates representative of a plurality of location
place names.

8. A method of operating a mobile speech rec-
ognition system to process input frames of
speech against stored templates representing
speech, said method comprising the steps of:

storing in a memory (36) a core library of
speech templates representing a basic vocabu-
lary of speech;

storing a plurality of location-specific librar-
ies of speech templates, each representing a
specialized vocabulary for a particular geo-
graphic location;

determining (64) a geographic location of
said mobile speech recognition system;

associating (66) said core library of speech
templates with a particular one of said plurality
of location-specific libraries of speech tem-
plates in response to said determination of
said geographic location of said mobile speech
recognition system; and

employing a processor (32) to process
(68) input frames of speech against said core
library and associated location-specific library.

9. A method as claimed in claim 8, wherein said
step of determining (64) a geographic location
of said mobile speech recognition system
comprises the step of utilizing a cellular tele-
phone system to determine a geographic loca-
tion of said mobile speech recognition system.

10. A method as claimed in claim 8, wherein said
step of determining a geographic location of
said mobile speech recognition system com-
prises the step of utilizing a geopositioning
satellite system receiver to determine a geo-
graphic location of said mobile speech rec-
ognition system.